This manual describes the programming language Cool: the Classroom Object-Oriented Language. Cool is a small language that can be implemented with reasonable effort in a one semester course. Still, Cool retains many of the features of modern programming languages including objects, static typing, and automatic memory management.
Cool programs are sets of classes. A class encapsulates the variables and procedures of a data type. Instances of a class are objects. In Cool, classes and types are identified. That is, every class defines a type. Classes permit programmers to define new types and associated procedures (or methods) specific to those types. Inheritance allows new types to extend the behavior of existing types.
Cool is an expression language. Most Cool constructs are expressions, and every expression has a value and a type. Cool is type safe: procedures are guaranteed to be applied to data of the correct type. While static typing imposes a strong discipline on programming in Cool, it guarantees that no runtime type errors can arise in the execution of Cool programs.
This manual is divided into informal and formal components. For a short, informal overview, the first half (through Section 9) suffices. The formal description begins with Section 10.
Cool source files have extension .cl. The programming projects will also define other file formats related to Cool but they are not officially part of the language specification.
You can obtain the Cool interpreter cool from the course website. To interpret (i.e., run) a Cool program:
cool file.cl
This official version is often called the reference implementation and you are encouraged to use it as a point of comparison when you are designing and testing parts of the course project. The reference Cool interpreter has been specifically structured so that you can run the various stages (i.e., lexing, parsing, type-checking and interpreting) independently. This power is useful for PA2 through PA5.
The Cool interpreter has an number of command-line options:
You may encounter other University uses of Cool on the web that mention programs such as \rm coolc and \rm spim. Those tool are used for a course on compilers; this is a course on interpreters. We will not use \rm coolc or \rm spim. In addition, we use a slightly different version of the Cool language specification, so comparing results against external tools may not be helpful.
All code in Cool is organized into classes. Each class definition must be contained in a single source file, but multiple classes may be defined in the same file. Class definitions have the form:
class <type> [ inherits <type> ] { <feature_list> };
The notation [ ... ] denotes an optional construct. All class names are globally visible. Class names begin with an uppercase letter. Classes may not be redefined.
The body of a class definition consists of a list of feature definitions. A feature is either an \it attribute or a \it method. An attribute of class \rm A specifies a variable that is part of the state of objects of class \rm A. A method of class \rm A is a procedure that may manipulate the variables and objects of class \rm A.
One of the major themes of modern programming languages is \it information\ hiding, which is the idea that certain aspects of a data type's implementation should be abstract and hidden from users of the data type. Cool supports information hiding through a simple mechanism: all attributes have scope local to the class, and all methods have global scope. Thus, the only way to provide access to object state in Cool is through methods.
Feature names must begin with a lowercase letter. No method name may be defined multiple times in a class, and no attribute name may be defined multiple times in a class, but a method and an attribute may have the same name.
A fragment from \rm list.cl illustrates simple cases of both attributes and methods:
class Cons inherits List { xcar : Int; xcdr : List; isNil() : Bool { false }; init(hd : Int, tl : List) : Cons { { xcar <- hd; xcdr <- tl; self; } }; ... };
In this example, the class \rm Cons has two attributes \rm xcar and \rm xcdr and two methods \rm isNil and \rm init. Note that the types of attributes, as well as the types of formal parameters and return types of methods, are explicitly declared by the programmer.
Given object \rm c of class \rm Cons and object \rm l of class \rm List, we can set the \rm xcar and \rm xcdr fields by using the method \rm init:
c.init(1,l)
This notation is \it object \HY oriented\ dispatch. There may be many definitions of \rm init methods in many different classes. The dispatch looks up the class of the object \rm c to decide which \rm init method to invoke. Because the class of \rm c is \rm Cons, the \rm init method in the \rm Cons class is invoked. Within the invocation, the variables \rm xcar and \rm xcdr refer to \rm c's attributes. The special variable \rm self refers to the object on which the method was dispatched, which, in the example, is \rm c itself.
There is a special form \rm new\ C that generates a fresh object of class \rm C. An object can be thought of as a record that has a slot for each of the attributes of the class as well as pointers to the methods of the class. A typical dispatch for the \rm init method is:
(new Cons).init(1,new Nil)
This example creates a new cons cell and initializes the "car" of the cons cell to be 1 and the "cdr" to be \rm new\ Nil.(2) There is no mechanism in Cool for programmers to deallocate objects. Cool has \it automatic\ memory\ management; objects that cannot be used by the program are deallocated by a runtime garbage collector.
Attributes are discussed further in Section 5 and methods are discussed further in Section 6.
If a class definition has the form
class C inherits P { ... };
then class \rm C inherits the features of \rm P. In this case P is the \it parent class of \rm C and \rm C is a \it child class of \rm P. The semantics of \rm C\ inherits\ P is that \rm C has all of the features defined in \rm P in addition to its own features. In the case that a parent and child both define the same method name, then the definition given in the child class takes precedence. It is illegal to redefine attribute names. Furthermore, for type safety, it is necessary to place some restrictions on how methods may be redefined (see Section 6).
There is a distinguished class \rm Object. If a class definition does not specify a parent class, then the class inherits from \rm Object by default. A class may inherit only from a single class; this is aptly called "single inheritance."(3) The parent-child relation on classes defines a graph. This graph may not contain cycles. For example, if \rm C inherits from \rm P, then \rm P must not inherit from \rm C. Furthermore, if \rm C inherits from \rm P, then \rm P must have a class definition somewhere in the program. Because Cool has single inheritance, it follows that if both of these restrictions are satisfied, then the inheritance graph forms a tree with \rm Object as the root.
In addition to \rm Object, Cool has four other \it basic\ classes: \rm Int, \rm String, \rm Bool, and \rm IO. The basic classes are discussed in Section 8.
In Cool, every class name is also a type. In addition, there is a type \rm SELF\_TYPE that can be used in special circumstances.
A \it type\ declaration has the form \rm x:C, where \rm x is a variable and \rm C is a type. Every variable must have a type declaration at the point it is introduced, whether that is in a \rm let, \rm case, or as the formal parameter of a method. The types of all attributes must also be declared.
The basic type rule in Cool is that if a method or variable expects a value of type \rm P, then any value of type \rm C may be used instead, provided that \rm P is an ancestor of \rm C in the class hierarchy. In other words, if \rm C inherits from \rm P, either directly or indirectly, then a \rm C can be used wherever a \rm P would suffice.
When an object of class \rm C may be used in place of an object of class \rm P, we say that \rm C \it conforms to \rm P or that \tt C \le P (think: \rm C is lower down in the inheritance tree). As discussed above, conformance is defined in terms of the inheritance graph.
\bf Definition\ 4.1 (Conformance) Let \tt A,C, and \tt P be types.
Because \rm Object is the root of the class hierarchy, it follows that \tt A \leq Object for all types \tt A.
The type \rm SELF\_TYPE is used to refer to the type of the \rm self variable. This is useful in classes that will be inherited by other classes, because it allows the programmer to avoid specifying a fixed final type at the time the class is written. For example, the program
class Silly { copy() : SELF_TYPE { self }; }; class Sally inherits Silly { }; class Main { x : Sally <- (new Sally).copy(); main() : Sally { x }; };
Because \rm SELF\_TYPE is used in the definition of the \rm copy method, we know that the result of \rm copy is the same as the type of the \rm self parameter. Thus, it follows that \rm (new Sally).copy() has type \rm Sally, which conforms to the declaration of attribute \rm x.
Note that the meaning of \rm SELF\_TYPE is not fixed, but depends on the class in which it is used. In general, \rm SELF\_TYPE may refer to the class \rm C in which it appears, or any class that conforms to \rm C. When it is useful to make explicit what \tt SELF\_TYPE may refer to, we use the name of the class \rm C in which \rm SELF\_TYPE appears as an index \tt SELF\_TYPE_{C}. This subscript notation is not part of Cool syntax--it is used merely to make clear in what class a particular occurrence of \rm SELF\_TYPE appears.
From Definition 4.1, it follows that \tt SELF\_TYPE_{X} \leq SELF\_TYPE_{X}. There is also a special conformance rule for \rm SELF\_TYPE:
\tt SELF\_TYPE_{C} \leq P\ \ if\ \ C \leq PFinally, \rm SELF\_TYPE may be used in the following places: \rm new\ SELF\_TYPE, as the return type of a method, as the declared type of a \rm let variable, or as the declared type of an attribute. No other uses of \rm SELF\_TYPE are permitted.
The Cool type system guarantees at compile time that execution of a program cannot result in runtime type errors. Using the type declarations for identifiers supplied by the programmer, the type checker infers a type for every expression in the program.
It is important to distinguish between the type assigned by the type checker to an expression at compile time, which we shall call the \it static type of the expression, and the type(s) to which the expression may evaluate during execution, which we shall call the \it dynamic types.
The distinction between static and dynamic types is needed because the type checker cannot, at compile time, have perfect information about what values will be computed at runtime. Thus, in general, the static and dynamic types may be different. What we require, however, is that the type checker's static types be \it sound with respect to the dynamic types.
\bf Definition\ 4.2 For any expression \tt e, let \tt D_{e} be a dynamic type of \tt e and let \tt S_{e} be the static type inferred by the type checker. Then the type checker is \it sound if for all expressions \tt e it is the case that \tt D_{e} \leq S_{e}.
Put another way, we require that the type checker err on the side of overestimating the type of an expression in those cases where perfect accuracy is not possible. Such a type checker will never accept a program that contains type errors. However, the price paid is that the type checker will reject some programs that would actually execute without runtime errors.
An attribute definition has the form
<id> : <type> [ <- <expr> ];
The expression is optional initialization that is executed when a new object is created. The static type of the expression must conform to the declared type of the attribute. If no initialization is supplied, then the default initialization is used (see below).
When a new object of a class is created, all of the inherited and local attributes must be initialized. Inherited attributes are initialized first in inheritance order beginning with the attributes of the greatest ancestor class. Within a given class, attributes are initialized in the order they appear in the source text.
Attributes are local to the class in which they are defined or inherited. Inherited attributes cannot be redefined.
All variables in Cool are initialized to contain values of the appropriate type. The special value \rm void is a member of all types and is used as the default initialization for variables where no initialization is supplied by the user. (\rm void is used where one would use \rm NULL in C or \rm null in Java; Cool does not have anything equivalent to C's or Java's \rm void type.) Note that there is no name for \rm void in Cool; the only way to create a \rm void value is to declare a variable of some class other than \rm Int, \rm String, or \rm Bool and allow the default initialization to occur, or to store the result of a \rm while loop.
There is a special form \rm isvoid\ expr that tests whether a value is \rm void (see Section 7.11). In addition, \rm void values may be tested for equality. A \rm void value may be passed as an argument, assigned to a variable, or otherwise used in any context where any value is legitimate, except that a dispatch to or case on \rm void generates a runtime error.
Variables of the basic classes \rm Int, \rm Bool, and \rm String are initialized specially; see Section 8.
A method definition has the form
<id>(<id> : <type>,...,<id> : <type>): <type> { <expr> };
There may be zero or more formal parameters. The identifiers used in the formal parameter list must be distinct. The type of the method body must conform to the declared return type. When a method is invoked, the formal parameters are bound to the actual arguments and the expression is evaluated; the resulting value is the meaning of the method invocation. A formal parameter hides any definition of an attribute of the same name.
To ensure type safety, there are restrictions on the redefinition of inherited methods. The rule is simple: If a class \rm C inherits a method \rm f from an ancestor class \rm P, then \rm C may override the inherited definition of \rm f provided the number of arguments, the types of the formal parameters, and the return type are exactly the same in both definitions.
To see why some restriction is necessary on the redefinition of inherited methods, consider the following example:
class P { f(): Int { 1 }; }; class C inherits P { f(): String { "1" }; };
Let \rm p be an object with dynamic type \rm P. Then
p.f() + 1
is a well-formed expression with value 2. However, we cannot substitute a value of type \rm C for \rm p, as it would result in adding a string to a number. Thus, if methods can be redefined arbitrarily, then subclasses may not simply extend the behavior of their parents, and much of the usefulness of inheritance, as well as type safety, is lost.
Expressions are the largest syntactic category in Cool.
The simplest expressions are constants. The boolean constants are \rm true and \rm false. Integer constants are unsigned strings of digits such as \rm 0, \rm 123, and \rm 007. String constants are sequences of characters enclosed in double quotes, such as \rm \LQT This\ is\ a\ string. \RQT String constants may be at most 1024 characters long. There are other restrictions on strings; see Section 10.
The constants belong to the basic classes \rm Bool, \rm Int, and \rm String. The value of a constant is an object of the appropriate basic class.
The names of local variables, formal parameters of methods, \rm self, and class attributes are all expressions. The identifier \rm self may be referenced, but it is an error to assign to \rm self or to bind \rm self in a \rm let, a \rm case, or as a formal parameter. It is also illegal to have attributes named \rm self.
Local variables and formal parameters have lexical scope. Attributes are visible throughout a class in which they are declared or inherited, although they may be hidden by local declarations within expressions. The binding of an identifier reference is the innermost scope that contains a declaration for that identifier, or to the attribute of the same name if there is no other declaration. The exception to this rule is the identifier \rm self, which is implicitly bound in every class.
An assignment has the form
<id> <- <expr>
The static type of the expression must conform to the declared type of the identifier. The value is the value of the expression. The static type of an assignment is the static type of \rm <expr>.
There are three forms of dispatch (i.e. method call) in Cool. The three forms differ only in how the called method is selected. The most commonly used form of dispatch is
<expr>.<id>(<expr>,...,<expr>)
Consider the dispatch \tt e_0.f(e_1,\ldots,e_n). To evaluate this expression, the arguments are evaluated in left-to-right order, from \tt e_1 to \tt e_n. Next, \tt e_0 is evaluated and its class \rm C noted (if \tt e_0 is \rm void a runtime error is generated). Finally, the method \rm f in class \rm C is invoked, with the value of \tt e_0 bound to \rm self in the body of \rm f and the actual arguments bound to the formals as usual. The value of the expression is the value returned by the method invocation.
Type checking a dispatch involves several steps. Assume \tt e_0 has static type A. (Recall that this type is not necessarily the same as the type \rm C above. \rm A is the type inferred by the type checker; \rm C is the class of the object computed at runtime, which is potentially any subclass of \rm A.) Class \rm A must have a method \rm f, the dispatch and the definition of \rm f must have the same number of arguments, and the static type of the ith actual parameter must conform to the declared type of the ith formal parameter.
If \rm f has return type \rm B and \rm B is a class name, then the static type of the dispatch is \rm B. Otherwise, if \rm f has return type \rm SELF\_TYPE, then the static type of the dispatch is \rm A. To see why this is sound, note that the \rm self parameter of the method \rm f conforms to type \rm A. Therefore, because \rm f returns \rm SELF\_TYPE, we can infer that the result must also conform to \rm A. Inferring accurate static types for dispatch expressions is what justifies including \rm SELF\_TYPE in the Cool type system.
The other forms of dispatch are:
<id>(<expr>,...,<expr>) <expr>@<type>.id(<expr>,...,<expr>)
The first form is shorthand for \rm self.<id>(<expr>,...,<expr>).
The second form provides a way of accessing methods of parent classes that have been hidden by redefinitions in child classes. Instead of using the class of the leftmost expression to determine the method, the method of the class explicitly specified is used. For example, \rm e@B.f() invokes the method \rm f in class \rm B on the object that is the value of \rm e. For this form of dispatch, the static type to the left of "@" must conform to the type specified to the right of "@".
A conditional has the form
if <expr> then <expr> else <expr> fi
The semantics of conditionals is standard. The predicate is evaluated first. If the predicate is \rm true, then the \rm then branch is evaluated. If the predicate is \rm false, then the \rm else branch is evaluated. The value of the conditional is the value of the evaluated branch.
The predicate must have static type \rm Bool. The branches may have any static types. To specify the static type of the conditional, we define an operation \sqcup (pronounced "join") on types as follows. Let \rm A,B,D be any types other than \rm SELF\_TYPE. The \it least\ type of a set of types means the least element with respect to the conformance relation \leq.
\begin{eqnarray*} \tt A \sqcup B &=& \tt the\ least\ type\ C\ such\ that\ A \leq C\ and\ B \leq C \\ \tt A \sqcup A &=& \tt A\ \hphantom{the least}(idempotent) \\ \tt A \sqcup B &=& \tt B \sqcup A\ \hphantom{the le}(commutative) \\ \tt SELF\_TYPE_{D} \sqcup A &=& \tt D \sqcup A \end{eqnarray*}Let \rm T and \rm F be the static types of the branches of the conditional. Then the static type of the conditional is \tt T \sqcup F. (think: Walk towards \rm Object from each of \rm T and \rm F until the paths meet.)
A loop has the form
while <expr> loop <expr> pool
The predicate is evaluated before each iteration of the loop. If the predicate is \rm false, the loop terminates and \rm void is returned. If the predicate is \rm true, the body of the loop is evaluated and the process repeats.
The predicate must have static type \rm Bool. The body may have any static type. The static type of a loop expression is \rm Object.
A block has the form
{ <expr>; ... <expr>; }
The expressions are evaluated in left-to-right order. Every block has at least one expression; the value of a block is the value of the last expression. The expressions of a block may have any static types. The static type of a block is the static type of the last expression.
An occasional source of confusion in Cool is the use of semi-colons (";"). Semi-colons are used as terminators in lists of expressions (e.g., the block syntax above) and not as expression separators. Semi-colons also terminate other Cool constructs, see Section 11 for details.
A let expression has the form
let <id1> : <type1> [ <- <expr1> ], ..., <idn> : <typen> [ <- <exprn> ] in <expr>
The optional expressions are \it initialization; the other expression is the \it body. A \rm let is evaluated as follows. First \rm <expr1> is evaluated and the result bound to \rm <id1>. Then \rm <expr2> is evaluated and the result bound to \rm <id2>, and so on, until all of the variables in the \rm let are initialized. (If the initialization of \rm <idk> is omitted, the default initialization of type \rm <typek> is used.) Next the body of the \rm let is evaluated. The value of the \rm let is the value of the body.
The \rm let identifiers \rm <id1>,...,<idn> are visible in the body of the \rm let. Furthermore, identifiers \rm <id1>,...,<idk> are visible in the initialization of \rm <idm> for any \rm m \gt k.
If an identifier is defined multiple times in a \rm let, later bindings hide earlier ones. Identifiers introduced by \rm let also hide any definitions for the same names in containing scopes. Every \rm let expression must introduce at least one identifier.
The type of an initialization expression must conform to the declared type of the identifier. The type of \rm let is the type of the body.
The \rm <expr> of a \rm let extends as far (encompasses as many tokens) as the grammar allows.
A case expression has the form
case <expr0> of <id1> : <type1> => <expr1>; ... <idn> : <typen> => <exprn>; esac
Case expressions provide runtime type tests on objects. First, \rm expr0 is evaluated and its dynamic type \rm C noted (if \rm expr0 evaluates to \rm void a run-time error is produced). Next, from among the branches the branch with the least type \rm <typek> such that \rm C \leq <typek> is selected. The identifier \rm <idk> is bound to the value of \rm <expr0> and the expression \rm <exprk> is evaluated. The result of the \rm case is the value of \rm <exprk>. If no branch can be selected for evaluation, a run-time error is generated. Every \rm case expression must have at least one branch.
For each branch, let \tt T_i be the static type of \rm <expri>. The static type of a \rm case expression is \tt\bigsqcup_{1 \leq i \leq n} T_i. The identifier \rm id introduced by a branch of a \rm case hides any variable or attribute definition for \rm id visible in the containing scope.
The \rm case expression has no special construct for a "default" or "otherwise" branch. The same affect is achieved by including a branch
x : Object => ...
because every type is \leq to \rm Object.
The \rm case expression provides programmers a way to insert explicit runtime type checks in situations where static types inferred by the type checker are too conservative. A typical situation is that a programmer writes an expression \tt e and type checking infers that \tt e has static type \tt P. However, the programmer may know that, in fact, the dynamic type of \tt e is always \tt C for some \tt C \leq P. This information can be captured using a case expression:
case e of x : C => ...
In the branch the variable \tt x is bound to the value of \tt e but has the more specific static type \tt C.
A \rm new expression has the form
new <type>
The value is a fresh object of the appropriate class. If the type is \rm SELF\_TYPE, then the value is a fresh object of the class of \rm self in the current scope. The static type is \rm <type>.
The expression
isvoid expr
evaluates to \rm true if \rm expr is \rm void and evaluates to \rm false if \rm expr is not \rm void.
\bf Binary\ Arithmetic
Cool has four binary arithmetic operations: \rm +,\ -,\ *,\ /. The syntax is
expr1 <op> expr2
To evaluate such an expression first \rm expr1 is evaluated and then \rm expr2. The result of the operation is the result of the expression.
The static types of the two sub-expressions must be \rm Int. The static type of the entire arithmetic expression is also \rm Int. Cool has only integer division.
\bf Binary\ Relations
Cool has three comparison operations: \rm <,\ <=,\ =. These comparisons may be applied to subexpressions of any types, subject to the following rules:
In all cases, the result of the comparison is a \rm Bool. See the type checking rules for more information.
In principle, there is nothing wrong with permitting equality tests between, for example, \rm Bool and \rm Int. However, such a test must always be false and almost certainly indicates some sort of programming error. The Cool type checking rules catch such errors at compile-time instead of waiting until runtime.
On non-basic objects, equality is decided via pointer equality (i.e., whether the memory addresses of the objects are the same). Equality is defined for \rm void: two \rm void values are equal and a \rm void value is never equal to a non-\rm void value. See the operational semantics rules for more informaiton.
\bf Unary\ Expressions
Finally, there is one unary arithmetic and one unary logical operator.
The \rm Object class is the root of the inheritance graph. Even the other basic classes (e.g., \rm IO and \rm Int) inherit from \rm Object (and thus inherit the three methods listed below). It is an error to redefine \rm Object. Methods with the following declarations are defined:
abort() : Object type_name() : String copy() : SELF_TYPE
The method \rm abort flushes all output and then halts program execution with the error message \rm \LQT abort \backslash n \RQT.
The method \rm type\_name returns a string with the name of the (run-time, dynamic) class of the object.
The method \rm copy produces a \it shallow copy of the object.(4)
The \rm IO class provides the following methods for performing simple input and output operations:
out_string(x : String) : SELF_TYPE out_int(x : Int) : SELF_TYPE in_string() : String in_int() : Int
The methods \rm out\_string and \rm out\_int print their argument, flush the standard output, and return their \rm self parameter.
The interpreter or compiler changes every \rm \backslash t to a tab and every \rm \backslash n to a newline in the argument \rm x to \rm out\_string before emitting the resulting string. Note that this is different from normal escape sequence handling, where \rm \backslash n would be a single character stored in the string. In Cool, it is two characters, but \rm out\_string prints a newline instead of \rm \backslash n.
The method \rm in\_string reads a string from the standard input, up to but not including a newline character or the end of file. The newline character is consumed but is not made part of the returned string. If an error occurs then \rm in\_string returns \rm \LQT \RQT, the string of length 0. Note that while literal lexical string constants are limited to size 1024, strings generated by \rm in\_string (or \rm String.concat, etc.) can be of arbitrary size. There is no special processing of the two-character sequences \rm \backslash t or \rm \backslash n (or, indeed \it \backslash anything) during \rm in\_string. Errors include:
The method \rm in\_int reads a single possibly-signed integer, which may be preceded by whitespace. Any characters following the integer, up to and including the next newline, are discarded by \rm in\_int. If an error occurs then \rm in\_int returns 0. Errors include:
A class can make use of the methods in the \rm IO class by inheriting from \rm IO. It is an error to redefine the \rm IO class.
The \rm Int class provides integers. There are no methods special to \rm Int. The default initialization for variables of type \rm Int is 0 (not \rm void). It is an error to inherit from or redefine \rm Int.
The \rm String class provides strings. The following methods are defined:
length() : Int concat(s : String) : String substr(i : Int, l : Int) : String
The method \rm length returns the length of the \rm self parameter. The method \rm concat returns the string formed by concatenating \rm s after \rm self. The method \rm substr returns the substring of its \rm self parameter beginning at position \rm i with length \rm l; string positions are numbered beginning at 0. A runtime error is generated if the specified substring is out of range. Substring errors are always reported as taking place on line 0.
The default initialization for variables of type \rm String is \rm \LQT \RQT (not \rm void). It is an error to inherit from or redefine \rm String.
The \rm Bool class provides \rm true and \rm false. The default initialization for variables of type \rm Bool is \rm false (not \rm void). It is an error to inherit from or redefine \rm Bool.
Every program must have a class \rm Main. Furthermore, the \rm Main class must have a method \rm main that takes no formal parameters. The \rm main method may be defined in class \rm Main or it may be inherited from another class. A program is executed by evaluating \rm (new Main).main().
The remaining sections of this manual provide a more formal definition of Cool. There are four sections covering lexical structure (Section 10), grammar (Section 11), type rules (Section 12), and operational semantics (Section 13).
The lexical units of Cool are integers, type identifiers, object identifiers, special notation, strings, keywords, and white space.
Integers are non-empty strings of digits 0-9. It is a lexer error if a literal integer constant is too big to be represented as a 32-bit signed integer. 32-bit signed integers range from -2,147,483,648 to +2,147,483,647. Cool integer constants are always non-negative, so valid integer constants range from 0 to 2,147,483,647.
Identifiers are strings (other than keywords) consisting of letters, digits, and the underscore character. Type identifiers begin with a capital letter; object identifiers begin with a lower case letter. Identifiers \it are case sensitive.
\bf self and \bf SELF\_TYPE are treated specially by Cool but are not treated as keywords. \bf self should be reported by the lexer as an identifier and \bf SELF\_TYPE should be reported by the lexer as a type. Both \it are case sensitive.
The special syntactic symbols (e.g., parentheses, assignment operator, etc.) are given in Figure 1.
Strings are enclosed in double quotes \rm \LQT ... \RQT. Within a string, a sequence '\c' denotes the two characters '\' and 'c', with the exception of the following:
\t tab \n newline
The two-character sequences \rm \backslash n and \rm \backslash t are called \it escape\ sequences. Other escape sequences like \rm \backslash r (carriage return) are not part of Cool. These two special escape sequences should not be interpreted or transformed by the lexer; they are handled by the \rm IO module and the run-time system.
A newline character may not appear in a string:
"This is not OK"
A string may contain embedded double quotes, so long as they are escaped. The following is a valid Cool string:
"David St. Hubbins said, \"It's such a fine line between stupid, and clever.\""
Note that Cool's interpretation of \rm \backslash " may not be what you are expecting. The two-character sequence \rm \backslash " (which is not an escape sequence) does not become \rm " in any sense. Instead, it stays \rm \backslash ". This is different from most other languages, but simplifies lexing and interpreting. Example:
class Main inherits IO { main() : Object { out_string("She said, \"Hello.\"\n") } ; } ;
A string may not contain EOF; strings cannot cross file boundaries. A string may not contain NUL, the character with ASCII value 0. The lexer must reject source text that contains malformed strings.
A string may contain the two-character sequence \rm \backslash 0 (backslash zero). However, that sequence does not have any special meaning -- it just yields a backslash followed by a zero inside the string.
The single character with converted integer value zero (the NUL) is not allowed. Any other character may be included in a string.
There are two forms of comments in Cool. Any characters between two dashes \rm \LQT \HY \RQT and the next newline (or EOF, if there is no next newline) are treated as comments. Comments may also be written by enclosing text in (\ast \ldots \ast ). The latter form of comment may be nested but may not contain EOF.
Comments cannot cross file boundaries.
The keywords of cool are: \bf class, \bf else, \bf false, \bf fi, \bf if, \bf in, \bf inherits, \bf isvoid, \bf let, \bf loop, \bf pool, \bf then, \bf while, \bf case, \bf esac, \bf new, \bf of, \bf not, \bf true. Except for the constants \bf true and \bf false, keywords are case insensitive.
To conform to the rules for other objects, the first letter of \bf true and \bf false must be lowercase; the trailing letters may be upper or lower case. Thus \bf FALse is not a keyword but a type identifier.
White space consists of any sequence of the characters: blank (ascii 32), \tt\backslash n (newline, ascii 10), \tt\backslash f (form feed, ascii 12), \tt\backslash r (carriage return, ascii 13), \tt\backslash t (tab, ascii 9), \tt\backslash v (vertical tab, ascii 11).
Figure 1 provides a specification of Cool syntax. The specification is not in pure Backus-Naur Form (BNF); for convenience, we also use some regular expression notation. Specifically, A^{\ast} means zero or more A's in succession; A^+ means one or more A's. Items in square brackets [\ldots] are optional. Double brackets \lbrack\!\lbrack \, \rbrack\!\rbrack are not part of Cool; they are used in the grammar as a meta-symbol to show association of grammar symbols (e.g. a \lbrack\!\lbrack b c \rbrack\!\rbrack ^ {+} means a followed by one or more bc pairs).
The precedence of infix binary and prefix unary operations, from highest to lowest, is given by the following table:
. @ ~ isvoid * / + - <= < = not <-
All binary operations are left-associative, with the exception of assignment, which is right-associative, and the three comparison operations, which do not associate.
This section formally defines the type rules of Cool. The type rules define the type of every Cool expression in a given context. The context is the \it type\ environment, which describes the type of every unbound identifier appearing in an expression. The type environment is described in Section 12.1. Section 12.2 gives the type rules.
To a first approximation, type checking in Cool can be thought of as a bottom-up algorithm: the type of an expression e is computed from the (previously computed) types of e's subexpressions. For example, an integer \rm 1 has type \rm Int; there are no subexpressions in this case. As another example, if \tt e_n has type \tt X, then the expression \tt \{\ e_1; \ldots; e_n;\ \!\} has type \tt X.
A complication arises in the case of an expression \tt v, where \tt v is an object identifier. It is not possible to say what the type of \tt v is in a strictly bottom-up algorithm; we need to know the type declared for \tt v in the larger expression. Such a declaration must exist for every object identifier in valid Cool programs.
To capture information about the types of identifiers, we use a \it type\ environment. The environment consists of three parts: a method environment M, an object environment O, and the name of the current class in which the expression appears. The method environment and object environment are both functions (also called \it mappings). The object environment is a function of the form
O(v)=Twhich assigns the type T to object identifier v. The method environment is more complex; it is a function of the form
M(C,f)=(T_1,\ldots,T_{n-1},T_n)where C is a class name (a type), f is a method name, and t_1,\ldots,t_n are types. The tuple of types is the \it signature of the method. The interpretation of signatures is that in class C the method f has formal parameters of types (t_1,\ldots,t_{n-1})---in that order---and a return type t_n.
Two mappings are required instead of one because object names and method names do not clash---i.e., there may be a method and an object identifier of the same name.
The third component of the type environment is the name of the current class, which is needed for type rules involving \rm SELF\_TYPE.
Every expression e is type checked in a type environment; the subexpressions of e may be type checked in the same environment or, if e introduces a new object identifier, in a modified environment. For example, consider the expression
let c : Int <- 33 in ...
The \rm let expression introduces a new variable \rm c with type \rm Int. Let O be the object component of the type environment for the \rm let. Then the body of the \rm let is type checked in the object type environment
O[Int/c]where the notation O[T/c] is defined as follows:
\begin{eqnarray} O[T/c](c) &\ =\ & T \\ O[T/c](d) &=& O(d)\ \ {\rm if}\ d \neq c \end{eqnarray}The general form a type checking rule is:
\frac{\vdots}{O,M,C \vdash e : T}The rule should be read: In the type environment for objects O, methods M, and containing class C, the expression e has type T. The dots above the horizontal bar stand for other statements about the types of sub-expressions of e. These other statements are hypotheses of the rule; if the hypotheses are satisfied, then the statement below the bar is true. In the conclusion, the "turnstile" ("\vdash") separates context (O,M,C) from statement (e : T).
The rule for object identifiers is simply that if the environment assigns an identifier Id type T, then Id has type T.
\frac{O(Id) = T}{O,M,C \vdash Id : T}\mbox{[Var]}The rule for assignment to a variable is more complex:
\frac{ \begin{array}{l} O(Id) = T \\ O,M,C \vdash e_1 : T' \\ T' \le T \\ \end{array} }{O,M,C \vdash Id \leftarrow e_1 : T'}\mbox{[ASSIGN]}Note that this type rule--as well as others--use the conformance relation \leq (see Section 3.2). The rule says that the assigned expression e_1 must have a type T' that conforms to the type T of the identifier Id in the type environment. The type of the whole expression is T'. The type rules for constants are all easy:
\frac{}{O,M,C \vdash true:Bool}\mbox{[True]} \frac{}{O,M,C \vdash false:Bool}\mbox{[False]} \frac{i\ {\rm is\ an\ integer\ constant}}{O,M,C \vdash i:Int}\mbox{[Int]} \frac{s\ {\rm is\ an\ string\ constant}}{O,M,C \vdash s:String}\mbox{[String]}There are two cases for new, one for new SELF_TYPE and one for any other form:
\frac{ T' = \left\{ \begin{array}{rl} {\rm SELF\_TYPE_C} & {\rm if}\ T = {\rm SELF\_TYPE} \\ T & {\rm otherwise} \end{array} \right. }{O,M,C\vdash new\ T:T'}\mbox{[New]}Dispatch expressions are the most complex to type check.
\frac{ \begin{array}{l} O,M,C \vdash e_0:T_0 \\ O,M,C \vdash e_1:T_1 \\ \vdots \\ O,M,C \vdash e_n:T_n \\ T'_0 = \left\{ \begin{array}{rl} C & {\rm if}\ T_0 = {\rm SELF\_TYPE_C} \\ T_0 & {\rm otherwise} \end{array} \right. \\ M(T'_0,f)=(T'_1,\ldots,T'_n,T'_{n+1}) \\ T_i \le T'_i\quad 1 \le i \le n \\ T_{n+1} = \left\{ \begin{array}{rl} T_0 & {\rm if}\ T'_{n+1}={\rm SELF\_TYPE} \\ T'_{n+1} & {\rm otherwise} \end{array} \right. \end{array} }{O,M,C \vdash e_0.f(e_1,\ldots,e_n):T_{n+1}}\mbox{[Dispatch]} \frac{ \begin{array}{l} O,M,C \vdash e_0:T_0 \\ O,M,C \vdash e_1:T_1 \\ \vdots \\ O,M,C \vdash e_n:T_n \\ T_0 \le T \\ M(T,f)=(T'_1,\ldots,T'_n,T'_{n+1}) \\ T_i \le T'_i\quad 1 \le i \le n \\ T_{n+1} = \left\{ \begin{array}{rl} T_0 & {\rm if}\ T'_{n+1}={\rm SELF\_TYPE} \\ T'_{n+1} & {\rm otherwise} \end{array} \right. \end{array} }{O,M,C \vdash e_0@T.f(e_1,\ldots,e_n):T_{n+1}}\mbox{[Static Dispatch]}To type check a dispatch, each of the subexpressions must first be type checked. The type T_0 of e_0 determines which declaration of the method f is used. The argument types of the dispatch must conform to the declared argument types. Note that the type of the result of the dispatch is either the declared return type or T_0 in the case that the declared return type is \tt SELF\_TYPE. The only difference in type checking a static dispatch is that the class T of the method f is given in the dispatch, and the type T_0 must conform to T.
The type checking rules for \rm if and \rm \{ \HY \} expressions are straightforward. See Section 7.5 for the definition of the \sqcup operation.
\frac{\begin{array}{l} O,M,C \vdash e_1:Bool \\ O,M,C \vdash e_2:T_2 \\ O,M,C \vdash e_3:T_3 \end{array}} {O,M,C \vdash {\rm if}\ e_1\ {\rm then}\ e_2\ {\rm else}\ e_3\ {\rm fi}:T_2\sqcup T_3}\mbox{[If]} \frac{\begin{array}{l} O,M,C \vdash e_1:T_1 \\ O,M,C \vdash e_2:T_2 \\ \vdots \\ O,M,C \vdash e_n:T_n \end{array}} {O,M,C \vdash \{\ e_1;e_2;\ldots e_n;\ \}:T_n}\mbox{[Sequence]}The \rm let rule has some interesting aspects.
\frac{\begin{array}{l} T'_0 = \left\{ \begin{array}{rl} {\tt SELF\_TYPE_C} & {\rm if}\ T_0 = {\tt SELF\_TYPE} \\ T_0 & {\rm otherwise} \end{array} \right. \\ O,M,C \vdash e_1:T_1 \\ T_1 \le T'_0 \\ O[T'_0/x],M,C \vdash e_2:T_2 \end{array}} {O,M,C \vdash {\rm let}\ x: T_0 \leftarrow e_1\ in\ e_2:T_2}\mbox{[Let-Init]}First, the initialization e_1 is type checked in an environment without a new definition for x. Thus, the variable x cannot be used in e_1 unless it already has a definition in an outer scope. Second, the body e_2 is type checked in the environment O extended with the typing x:T_0'. Third, note that the type of x may be \rm SELF\_TYPE.
\frac{\begin{array}{l} T'_0 = \left\{ \begin{array}{rl} {\tt SELF\_TYPE_C} & {\rm if}\ T_0 = {\tt SELF\_TYPE} \\ T_0 & {\rm otherwise} \end{array} \right. \\ O[T'_0/x],M,C \vdash e_1:T_1 \end{array}} {O,M,C \vdash {\rm let}\ x:T_0\ {\rm in}\ e_1:T_1}\mbox{[Let-No-Init]}The rule for \rm let with no initialization simply omits the conformance requirement. We give type rules only for a \rm let with a single variable. Typing a multiple \rm let
\rm let\ x_1 : T_1\ [\leftarrow e_1],\ x_2: T_2\ [\leftarrow e_2], \ldots,\ x_n :T_n\ [\leftarrow e_n]\ in\ eis defined to be the same as typing
\rm let\ x_1 : T_1\ [\leftarrow e_1]\ in\ (let\ x_2 :T_2\ [\leftarrow e_2],\ldots,\ x_n : T_n\ [\leftarrow e_n]\ in\ e\ )\ \frac{\begin{array}{l} O,M,C \vdash e_0:T_0 \\ O[T_1/x_1],M,C \vdash e_1:T'_1 \\ \vdots \\ O[T_n/x_n],M,C \vdash e_n:T'_n \end{array}} {O,M,C \vdash {\rm case}\ e_0\ {\rm of}\ x_1:T_1 \Rightarrow e_1;,\ldots x_n:T_n \Rightarrow e_n;\ {\rm esac}:\sqcup_{1 \le i \le n} T'_i}\mbox{[Case]}Each branch of a \rm case is type checked in an environment where variable x_i has type T_i. The type of the entire \rm case is the join of the types of its branches. The variables declared on each branch of a \rm case must all have distinct types.
\frac{\begin{array}{l} O,M,C \vdash e_1:Bool \\ O,M,C \vdash e_2:T_2 \end{array}} {O,M,C \vdash {\rm while}\ e_1\ {\rm loop}\ e_2\ {\rm pool}:Object}\mbox{[Loop]}The predicate of a loop must have type Bool; the type of the entire loop is always Object. An \rm isvoid test has type Bool:
\frac{ O,M,C \vdash e_1:T_1 } {O,M,C \vdash {\rm isvoid}\ e_1:Bool}\mbox{[Isvoid]}With the exception of the rule for equality, the type checking rules for the primitive logical and arithmetic operations are easy.
\frac{ O,M,C \vdash e_1:Bool } {O,M,C \vdash \lnot e_1:Bool}\mbox{[Not]} \frac{ O,M,C \vdash e_1:Int } {O,M,C \vdash \sim e_1:Int}\mbox{[Neg]} \frac{\begin{array}{l} O,M,C \vdash e_1:Int \\ O,M,C \vdash e_2:Int \\ {\rm op} \in \{*,+,-,/\} \end{array}} {O,M,C \vdash e_1\ op\ e_2:Int}\mbox{[Arith]}The wrinkle in the rule for equality is that any types may be freely compared except \rm Int, \rm String and \rm Bool, which may only be compared with objects of the same type. The cases for \rm < and \rm <= are similar to the rule for equality.
\frac{\begin{array}{l} O,M,C \vdash e_1 : T_1 \\ O,M,C \vdash e_2 : T_2 \\ T_1 \in \{Int,String,Bool\} \lor T_2 \in \{Int,String,Bool\} \Rightarrow T_1 = T_2 \end{array}} {O,M,C \vdash e_1 = e_2 : Bool}\mbox{[Equal]}The final cases are type checking rules for attributes and methods. For a class C, let the object environment O_C give the types of all attributes of C (including any inherited attributes). More formally, if x is an attribute (inherited or not) of C, and the declaration of x is x:T, then
O_C(x) = \left\{ \begin{array}{rl} {\rm SELF\_TYPE_C} & {\rm if}\ T = {\rm SELF\_TYPE} \\ T & otherwise \end{array} \right.The method environment M is global to the entire program and defines for every class C the signatures of all of the methods of C (including any inherited methods).
The two rules for type checking attribute defininitions are similar the rules for \rm let. The essential difference is that attributes are visible within their initialization expressions. Note that \rm self is bound in the initialization.
\frac{\begin{array}{l} O_C(x) = T_0 \\ O_C [{\rm SELF\_TYPE}_C/self],M,C \vdash e_1 : T_1 \\ T_1 \le T_0 \end{array}} {O_C,M,C \vdash x : T_0 \leftarrow e_1; }\mbox{[Attr-Init]} \frac{ O_C(x) = T } {O_C,M,C \vdash x : T;}\mbox{[Attr-No-Init]}The rule for typing methods checks the body of the method in an environment where O_C is extended with bindings for the formal parameters and \rm self. The type of the method body must conform to the declared return type.
\frac{\begin{array}{l} M(C,f) = (T_1,\ldots,T_n,T_0) \\ O_C[{\rm SELF\_TYPE}_C/self][T_1/x_1]\ldots[T_n/x_n],M,C \vdash e : T'_0 \\ T'_0 \le \left\{ \begin{array}{rl} {\rm SELF\_TYPE}_C & {\rm if}\ T_0 = {\rm SELF\_TYPE} \\ T_0 & otherwise \end{array} \right. \end{array}} {O_C,M,C \vdash f(x_1 : T_1,\ldots,x_n : T_n):T_0 \{\ e\ \};}\mbox{[Method]}There are a number of semantic checks applied to Cool programs that are not captured by formal typing rules. For example, a Cool program cannot contain an inheritance cycle. Similarly, a Cool program cannot contain a class that inherits from \rm String. These rules are scattered through the \it Cool\ Reference\ Manual.
The order in which these other checks are performed is \it not\ specified. If a Cool program contains both an inheritance cycle and also a class that inherits from \rm String, the Cool compiler may report whichever error it prefers.
This section contains a mostly formal presentation of the operational semantics for the Cool language. The operational semantics define for every Cool expression what value it should produce in a given context. The context has three components: an environment, a store, and a self object. These components are described in the next section. Section 13.2 defines the syntax used to refer to Cool objects, and Section 13.3 defines the syntax used to refer to class definitions.
Keep in mind that a formal semantics is a specification only--it does not describe an implementation. The purpose of presenting the formal semantics is to make clear all the details of the behavior of Cool expressions. How this behavior is implemented is another matter.
Before we can present a semantics for Cool we need a number of concepts and a considerable amount of notation. An \it environment is a mapping of variable identifiers to \it locations. Intuitively, an environment tells us for a given identifier the address of the memory location where that identifier's value is stored. For a given expression, the environment must assign a location to all identifiers to which the expression may refer. For the expression, e.g., a + b, we need an environment that maps a to some location and b to some location. We'll use the following syntax to describe environments, which is very similar to the syntax of type assumptions used in Section 12.
E = [a : l_{1} , b : l_{2}]This environment maps a to location l_{1}, and b to location l_{2}.
The second component of the context for the evaluation of an expression is the \it store(memory). The store maps locations to values, where values in Cool are just objects. Intuitively, a store tells us what value is stored in a given memory location. For the moment, assume all values are integers. A store is similar to an environment:
S=[l_{1} \rightarrow 55, l_{2} \rightarrow 77]This store maps location l_{1} to value 55 and location l_{2} to value 77.
Given an environment and a store, the value of an identifier can be found by first looking up the location that the identifier maps to in the environment and then looking up the location in the store.
E(a) = l_{1} \\ S(l_{1}) = 55Together, the environment and the store define the execution state at a particular step of the evaluation of a Cool expression. The double indirection from identifiers to locations to values allows us to model variables. Consider what happens if the value 99 is assigned variable a in the environment and store defined above. Assigning to a variable means changing the value to which it refers but not its location. To perform the assignment, we look up the location for a in the environment E and then change the mapping for the obtained location to the new value, giving a new store S'.
\begin{eqnarray} E(a) &=& l_{1} \\ S' &=& S[99/l_{1}] \end{eqnarray}The syntax S[v/l] denotes a new store that is identical to the store S, except that S' maps location l to value v. For all locations l' where l'\not=l, we still have S'(l')=S(l').
The store models the contents of memory of the computer during program execution. Assigning to a variable modifies the store.
There are also situations in which the environment is modified. Consider the following Cool fragment:
let c : Int <- 33 in c
When evaluating this expression, we must introduce the new identifier c into the environment before evaluating the body of the \rm let. If the current environment and state are E and S, then we create a new environment E' and a new store S' defined by:
\begin{eqnarray} l_{c} &=& newloc(S) \\ E' &=& E[l_{c}/c] \\ S' &=& S[33/l_{c}] \end{eqnarray}The first step is to allocate a location for the variable c. The location should be fresh, meaning that the current store does not have a mapping for it. The function newloc() applied to a store gives us an unused location in that store. We then create a new environment E', which maps c to l_{c} but also contains all of the mappings of E for identifiers other than c. Note that if c already has a mapping in E, the new environment E' hides this old mapping. We must also update the store to map the new location to a value. In this case l_{c} maps to the value 33, which is the initial value for c as defined by the let-expression.
The example in this subsection oversimplifies Cool environments and stores a bit, because simple integers are not Cool values. Even integers are full-fledged objects in Cool.
Every Cool value is an object. Objects contain a list of named attributes, a bit like records in C. In addition, each object belongs to a class. We use the following syntax for values in Cool:
v = X(a_{1} = l_{1}, a_{2} = l_{2}, \dots , a_{n} = l_{n})Read the syntax as follows: The value v is a member of class X containing the attributes a_1, \ldots, a_n whose locations are l_1, \ldots, l_n. Note that the attributes have an associated location. Intuitively this means that there is some space in memory reserved for each attribute. The value v has dynamic type X.
For base objects of Cool (i.e., \tt Ints, \tt Strings, and \tt Bools) we use a special case of the above syntax. Base objects have a class name, but their attributes are not like attributes of normal classes, because they cannot be modified. Therefore, we describe base objects using the following syntax:
\begin{array}{l} Int(5) \\ Bool(true) \\ String(4,{\tt"Cool"}) \end{array}For \tt Ints and \tt Bools, the meaning is obvious. \tt Strings contain two parts, the length and the actual sequence of ASCII characters.
In the rules presented in the next section, we need a way to refer to the definitions of attributes and methods for classes. Suppose we have the following Cool class definition:
class B { s : String <- "Hello"; g (y:String) : Int { y.concat(s) }; f (x:Int) : Int { x+1 }; }; class A inherits B { a : Int; b : B <- new B; f(x:Int) : Int { x+a }; };
Two mappings, called \it class and \it implementation, are associated with \it class definitions. The class mapping is used to get the attributes, as well as their types and initializations, of a particular class:
class(a) = (s:String \leftarrow {\tt"Hello"}, a:Int \leftarrow O, b:B \leftarrow new\ B)Note that the information for class A contains everything that it inherited from class B, as well as its own definitions. If B had inherited other attributes, those attributes would also appear in the information for A. The attributes are listed in the order they are inherited and then in source order: all the attributes from the greatest ancestor are listed first in the order in which they textually appear, then the attributes of the next greatest ancestor, and so on, on down to the attributes defined in the particular class. We rely on this order in describing how new objects are initialized.
The general form of a class mapping is:
class(X) = (a_{1}:T_{1} \leftarrow e_{1}, \dots , a_{n}:T_{n} \leftarrow e_{n})Note that every attribute has an initializing expression, even if the Cool program does not specify one for each attribute. The \it default initialization for a variable or attribute is the default of its type. The default of \tt Int is \tt 0, the default of \rm String is \tt "", the default of \rm Bool is \rm false, and the default of any other type is \rm void. (5) The default of type T is written D_{T}.
The implementation mapping gives information about the methods of a class. For the above example, \it implementation of A is defined as follows:
\begin{eqnarray} implementation(A,f) &=& (x, x+a) \\ implementation(A,g) &=& (y, y.concat(s)) \end{eqnarray}In general, for a class X and a method m,
implementation(X,m) = (x_{1}, x_{2}, \dots , x_{n}, e_{body})specifies that method m when invoked from class X, has formal parameters x_{1}, \ldots, x_{n}, and the body of the method is expression e_{body}.
Equipped with environments, stores, objects, and class definitions, we can now attack the operational semantics for Cool. The operational semantics is described by rules similar to the rules used in type checking. The general form of the rules is:
\frac{ \vdots } { so,S,E \vdash e_{1} : v,S' }The rule should be read as: In the context where \it self is the object so, the store is S, and the environment is E, the expression e_{1} evaluates to object v and the new store is S'. The dots above the horizontal bar stand for other statements about the evaluation of sub-expressions of e_{1}.
Besides an environment and a store, the evaluation context contains a self object so. The self object is just the object to which the identifier \tt self refers if \tt self appears in the expression. We do not place \tt self in the environment and store because \tt self is not a variable--it cannot be assigned to. Note that the rules specify a new store after the evaluation of an expression. The new store contains all changes to memory resulting as side effects of evaluating expression e_{1}.
The rest of this section presents and briefly discusses each of the operational rules. A few cases are not covered; these are discussed at the end of the section.
\frac{\begin{array}{l} so,S_{1},E \vdash e_{1}:v_{1}, S_{2} \\ E(Id) = l_{1} \\ S_{3} = S_{2}[v_{1}/l_{1}] \end{array}} {so,S_{1},E \vdash Id \leftarrow e_{1} : v_{1}, S_{3}}\mbox{[Assign]}An assignment first evaluates the expression on the right-hand side, yielding a value v_{1}. This value is stored in memory at the address for the identifier.
The rules for identifier references, self, and constants are straightforward:
\frac{\begin{array}{l} E(Id) = l \\ S(l) = v \end{array}} {so, S, E \vdash Id : v, S}\mbox{[Var]} \frac{\begin{array}{l} \\ \end{array}} {so, S, E \vdash self : so, S}\mbox{[Self]} \frac{\begin{array}{l} \\ \end{array}} {so, S, E \vdash true : Bool(true), S}\mbox{[True]} \frac{\begin{array}{l} \\ \end{array}} {so, S, E \vdash false : Bool(false), S}\mbox{[False]} \frac{\begin{array}{l} i \ is \ an \ integer \ constant \end{array}} {so, S, E \vdash i : Int(i), S}\mbox{[Int]} \frac{\begin{array}{l} s \ is \ a \ string \ constant \\ l = length(s) \end{array}} {so, S, E \vdash s : String(l,s), S}\mbox{[String]} \frac{ \begin{array}{l} T_0 = \left\{ \begin{array}{rl} X & {\rm if}\ T = {\rm SELF\_TYPE \ and \ so = X(\dots)} \\ T & {\rm otherwise} \end{array} \right. \\ class(T_{0}) = (a_{1} : T_{1} \leftarrow e_{1} , \dots , a_{n} : T_{n} \leftarrow e_{n}) \\ l_{i} = newloc(S_{1}), for \ i = 1 \dots n \ and \ each \ l_{i} \ is \ distinct \\ v_{1} = T_{0}(a_{1} = l_{1}, \dots , a_{n} = l_{n}) \\ S_{2} = S_{1}[D_{T_{1}}/l_{1}, \dots , D_{T_{n}}/l_{n}] \\ v_{1}, S_{2}, [a_{1} : l_{1}, \dots , a_{n} : l_{n}] \vdash {a_{1} \leftarrow e_{1}; \dots ; a_{n} \leftarrow e_{n};} : v_{2}, S_{3} \end{array} }{so, S_{1}, E \vdash new \ T : v_{1}, S_{3}}\mbox{[New]}The tricky thing in a \rm new expression is to initialize the attributes in the right order. If an attribute does not have an initializer, \it do \ not evaluate an assignment expression for it in the final step. Note also that, during initialization, attributes are bound to the default of the appropriate class.
\frac{\begin{array}{l} so, S_{1}, E \vdash e_{1} : v_{1}, S_{2} \\ so, S_{2}, E \vdash e_{2} : v_{2}, S_{3} \\ \vdots \\ so, S_{n}, E \vdash e_{n} : v_{n}, S_{n+1} \\ so, S_{n+1}, E \vdash e_{0} : v_{0}, S_{n+2} \\ v_{0} = X(a_{1} = l_{a_{1}} , \dots , a_{m} = l_{a_{m}}) \\ implementation(X,f) = (x_{1}, \dots , x_{n}, e_{n+1}) \\ l_{x_{i}} = newloc(S_{n+2}), \ for \ i = 1 \dots n \ and \ each \ l_{x_{i}} \ is \ distinct \\ S_{n+3} = S_{n+2}[v_{1}/l_{x_{1}} , \dots , v_{n}/l_{x_{n}}] \\ v_{0}, S_{n+3}, [a_{1} : l_{a_{1}}, \dots , a_{m} : l_{a_{m}}, x_{1} : l_{x_{1}} , \dots , x_{n} : l_{x_{n}}] \vdash e_{n+1} : v_{n+1} , S_{n+4} \end{array}} {so, S_{1}, E \vdash e_{0}.f(e_{1}, \dots , e_{n}) : v_{n+1}, S_{n+4}}\mbox{[Dispatch]} \frac{\begin{array}{l} so, S_{1}, E \vdash e_{1} : v_{1}, S_{2} \\ so, S_{2}, E \vdash e_{2} : v_{2}, S_{3} \\ \vdots \\ so, S_{n}, E \vdash e_{n} : v_{n}, S_{n+1} \\ so, S_{n+1}, E \vdash e_{0} : v_{0}, S_{n+2} \\ v_{0} = X(a_{1} = l_{a_{1}} , \dots , a_{m} = l_{a_{m}}) \\ implementation(T,f) = (x_{1}, \dots , x_{n}, e_{n+1}) \\ l_{x_{i}} = newloc(S_{n+2}), \ for \ i = 1 \dots n \ and \ each \ l_{x_{i}} \ is \ distinct \\ S_{n+3} = S_{n+2}[v_{1}/l_{x_{1}} , \dots , v_{n}/l_{x_{n}}] \\ v_{0}, S_{n+3}, [a_{1} : l_{a_{1}}, \dots , a_{m} : l_{a_{m}}, x_{1} : l_{x_{1}} , \dots , x_{n} : l_{x_{n}}] \vdash e_{n+1} : v_{n+1} , S_{n+4} \end{array}} {so, S_{1}, E \vdash e_{0}@T.f(e_{1}, \dots , e_{n}) : v_{n+1}, S_{n+4}}\mbox{[Static Dispatch]}The two dispatch rules do what one would expect. The arguments are evaluated and saved. Next, the expression on the left-hand side of the \rm \LQT . \RQT is evaluated. In a normal dispatch, the class of this expression is used to determine the method to invoke; otherwise the class is specified in the dispatch itself.
\frac{\begin{array}{l} so, S_{1}, E \vdash e_{1} : Bool(true), S_{2} \\ so, S_{2}, E \vdash e_{2} : v_{2}, S_{3} \end{array}} {so, S_{1}, E \vdash if \ e_{1} \ then \ e_{2} \ else \ e_{3} \ fi : v_{2}, S_{3}}\mbox{[If-True]} \frac{\begin{array}{l} so, S_{1}, E \vdash e_{1} : Bool(false), S_{2} \\ so, S_{2}, E \vdash e_{3} : v_{3}, S_{3} \end{array}} {so, S_{1}, E \vdash if \ e_{1} \ then \ e_{2} \ else \ e_{3} \ fi : v_{3}, S_{3}}\mbox{[If-False]}There are no surprises in the if-then-else rules. Note that value of the predicate is a \tt Bool object, not a boolean.
\frac{\begin{array}{l} so, S_{1}, E \vdash e_{1} : v_{1}, S_{2} \\ so, S_{2}, E \vdash e_{2} : v_{2}, S_{3} \\ \vdots \\ so, S_{n}, E \vdash e_{n} : v_{n}, S_{n+1} \end{array}} {so, S_{1}, E \vdash \{ e_{1} ; e_{2} ; \dots ; e_{n} ; \} : v_{n}, S_{n+1}}\mbox{[Sequence]}Blocks are evaluated from the first expression to the last expression, in order. The result is the result of the last expression.
\frac{\begin{array}{l} so, S_{1}, E \vdash e_{1} : v_{1}, S_{2} \\ l_{1} = newloc(S_{2}) \\ S_{3} = S_{2}[v_{1}/l_{1}] \\ E' = E[l_{1}/Id] \\ so, S_{3}, E' \vdash e_{2} : v_{2}, S_{4} \end{array}} {so, S_{1}, E \vdash let \ Id : T_{1} \leftarrow e_{1} \ in \ e_{2} : v_{2}, S_{4}}\mbox{[Let]}A \tt let evaluates any initialization code, assigns the result to the variable at a fresh location, and evaluates the body of the \tt let. (If there is no initialization, the variable is initialized to the default value of T_1.) We give the operational semantics only for the case of \tt let with a single variable. The semantics of a multiple \tt let
let \ x_{1} : T_{1} \leftarrow e_{1}, x_{2} : T_{2} \leftarrow e_{2} , \dots , x_{n} : T_{n} \leftarrow e_{n} \ in \ eis defined to be the same as
let \ x_{1} : T_{1} \leftarrow e_{1} \ in \ (let \ x_{2} : T_{2} \leftarrow e_{2} , \dots , x_{n} : T_{n} \leftarrow e_{n} \ in \ e) \frac{\begin{array}{l} so, S_{1}, E \vdash e_{0} : v_{0} , S_{2} \\ v_{0} = X(\dots) \\ T_{i} = closest \ ancestor \ of \ X \ in \ \{ T_{1}, \dots , T_{n} \} \\ l_{0} = newloc(S_{2}) \\ S_{3} = S_{2}[v_{0}/l_{0}] \\ E' = E[l_{0}/Id_{i}] \\ so, S_{3}, E' \vdash e_{i} : v_{1}, S_{4} \end{array}} {so, S_{1}, E \vdash case \ e_{0} \ of \ Id_{1} : T_{1} \Rightarrow e_{1} ; \dots ; Id_{n} : T_{n} \Rightarrow e_{n} ; esac : v_{1}, S_{4}}\mbox{[Case]}Note that the \tt case rule requires that the class hierarchy be available in some form at runtime, so that the correct branch of the \tt case can be selected. This rule is otherwise straightforward.
\frac{\begin{array}{l} so, S_{1}, E \vdash e_{1} : Bool(true), S_{2} \\ so, S_{2}, E \vdash e_{2} : v_{2}, S_{3} \\ so, S_{3}, E \vdash while \ e_{1} \ loop \ e_{2} \ pool : void, S_{4} \end{array}} {so, S_{1}, E \vdash while \ e_{1} \ loop \ e_{2} \ pool : void, S_{4}}\mbox{[Loop-True]} \frac{\begin{array}{l} so, S_{1}, E \vdash e_{1} : Bool(false), S_{2} \end{array}} {so, S_{1}, E \vdash while \ e_{1} \ loop \ e_{2} \ pool : void, S_{2}}\mbox{[Loop-False]}There are two rules for \tt while: one for the case where the predicate is \tt true and one for the case where the predicate is \tt false. Both cases are straightforward. The two rules for \tt isvoid are also straightforward:
\frac{\begin{array}{l} so, S_{1}, E \vdash e_{1} : void, S_{2} \end{array}} {so, S_{1}, E \vdash isvoid \ e_{1} : Bool(true), S_{2}}\mbox{[IsVoid-True]} \frac{\begin{array}{l} so, S_{1}, E \vdash e_{1} : X(\dots), S_{2} \end{array}} {so, S_{1}, E \vdash isvoid \ e_{1} : Bool(false), S_{2}}\mbox{[IsVoid-False]}The remainder of the rules are for the primitive arithmetic and logical operations. These are all easy rules.
\frac{\begin{array}{l} so, S_{1}, E \vdash e_{1} : Bool(b), S_{2} \\ v_{1} = Bool(\neg b) \end{array}} {so, S_{1}, E \vdash not \ e_{1} : v_{1}, S_{2}}\mbox{[Not]} \frac{\begin{array}{l} so, S_{1}, E \vdash e_{1} : Int(i_{1}), S_{2} \\ v_{1} = Int(-i_{1}) \end{array}} {so, S_{1}, E \vdash \verb|~| e_{1} : v_{1}, S_{2}}\mbox{[Neg]} \frac{\begin{array}{l} so, S_{1}, E \vdash e_{1} : Int(i_{1}), S_{2} \\ so, S_{2}, E \vdash e_{2} : Int(i_{2}), S_{3} \\ op \in \{ *, +, -, / \} \\ v_{1} = Int(i_{1} \ op \ i_{2}) \end{array}} {so, S_{1}, E \vdash e_{1} \ op \ e_{2} : v_{1} , S_{3}}\mbox{[Arith]}Cool \tt Ints are 32-bit two's complement signed integers; the arithmetic operations are defined accordingly.
The notation and rules given above are not powerful enough to describe how objects are tested for equality, or how runtime exceptions are handled. For these cases we resort to an English description.
In e_{1} = e_{2}, first e_{1} is evaluated and then e_{2} is evaluated. The two objects are compared for equality by first comparing their pointers (addresses). If they are the same, the objects are equal. The value \tt void is not equal to any object except itself. If the two objects are of type \tt String, \tt Bool, or \tt Int, their respective contents are compared. \tt < and \tt <= are handled similarly. The case for integer arguments is simple:
\frac{ \begin{array}{l} so, S_{1}, E \vdash e_{1} : Int(i_{1}), S_{2} \\ so, S_{2}, E \vdash e_{2} : Int(i_{2}), S_{3} \\ op \in \{ \le , \lt \} \\ v_{1} = \left\{ \begin{array}{ll} Bool(true), & {\rm if}\ i_{1} \ op \ i_{2} \\ Bool(false), & {\rm otherwise} \end{array} \right. \end{array}} {so, S_{1}, E \vdash e_{1} \ op \ e_{2} : v_{1} , S_{3}}\mbox{[Comp]}... but \tt String and \tt Bool also admit comparisons. String comparisons are performed using the standard ASCII string ordering (e.g., \tt "abc" < "xyz"). For booleans, \tt false is defined to be less than \tt true. Any other comparison (e.g., a comparison among non-void objects of different types) returns \tt false. Note that for some objects this may be unintuitive: if \tt c is a \tt Cat and \tt d is a \tt Dog then \tt c < d is \tt false but \tt d < c is also \tt false. Note also that comparison is based on the dynamic type of the object, not on the static type of the object.
In addition, the operational rules do not specify what happens in the event of a runtime error. When a runtime error occurs, output is flushed and execution aborts. The following list specifies all possible runtime errors.
Each outstanding "method invocation" (static or dynamic) and each outstanding "new" object allocation expression counts as a "Cool Activation Record". (Just to be clear, that second clause about "new" is counting currently-resolving constructor calls, not "total objects living in the heap".) A Cool interpreter \it must flag a "stack overflow" runtime error if and only if there are \textbf{1000} (one thousand) or more outstanding Cool Activation Records.
Finally, the rules given above do not explain the execution behaviour for dispatches to primitive methods defined in the \tt Object, \tt IO, or \tt String classes. Descriptions of these primitive methods are given in Sections 8.3-8.5.
Cool Assembly Language is a simplified RISC-style assembly language that is reminiscent of MIPS Assembly Language crossed with x86 Assembly Language. It also features typing aspects that may remind one of Java Bytecode.
A Cool Assembly Language \bf program is a list of \bf instructions. Each instruction may be preceded by any number of \bf labels. Comments follow the standard Cool conventions. In addition, a semicolon \rm ; functions like a double dash \rm -- in that it marks the rest of that line as a comment. The Cool CPU is a load-store architecture with eight general purpose registers and three special-purpose registers. For simplicity, a machine word can hold either a 32-bit integer value or an entire raw string; regardless, all machine words have size one.
This document assumes that you already have some familiarity with assembly language, registers, and how CPUs operate. We first present a formal grammar and then explain the semantics. Only terms in \rm typewriter font are part of the formal grammar. Text after \text{--} is a comment. We use \it italics for non-terminals.
\begin{eqnarray} {\it register}\ &::=&\ {\rm r0} \text{ -- general purpose register #0, often used as the } {\it accumulator} \\ {\it register}\ &::=&\ {\rm r1} \text{ -- general purpose register #1} \\ {\it register}\ &::=&\ {\rm r2} \\ {\it register}\ &::=&\ {\rm r3} \\ {\it register}\ &::=&\ {\rm r4} \\ {\it register}\ &::=&\ {\rm r5} \\ {\it register}\ &::=&\ {\rm r6} \\ {\it register}\ &::=&\ {\rm r7} \\ {\it register}\ &::=&\ {\rm sp} \text{ -- stack pointer register} \\ {\it register}\ &::=&\ {\rm fp} \text{ -- frame pointer register} \\ {\it register}\ &::=&\ {\rm ra} \text{ -- return address pointer} \\ \end{eqnarray} \begin{eqnarray} {\it instruction}\ &::=&\ {\rm li}\ {\it register} \text{ -- load immediate} \\ {\it instruction}\ &::=&\ {\rm mov}\ {\it register}\ < \!\!\!-\ {\it register} \text{ -- register-to-register copy} \\ {\it instruction}\ &::=&\ {\rm add}\ {\it register}\ < \!\!\!-\ {\it register}\ {\it register} \\ {\it instruction}\ &::=&\ {\rm sub}\ {\it register}\ < \!\!\!-\ {\it register}\ {\it register} \\ {\it instruction}\ &::=&\ {\rm mul}\ {\it register}\ < \!\!\!-\ {\it register}\ {\it register} \\ {\it instruction}\ &::=&\ {\rm div}\ {\it register}\ < \!\!\!-\ {\it register}\ {\it register} \\ \\ {\it instruction}\ &::=&\ {\rm jmp}\ {\it label} \text{ -- unconditional branch} \\ {\it instruction}\ &::=&\ {\rm bz}\ {\it register}\ {\it label} \text{ -- branch if equal to zero} \\ {\it instruction}\ &::=&\ {\rm bnz}\ {\it register}\ {\it label} \text{ -- branch if not zero} \\ {\it instruction}\ &::=&\ {\rm beq}\ {\it register}\ {\it register}\ {\it label} \text{ -- branch if equal} \\ {\it instruction}\ &::=&\ {\rm blt}\ {\it register}\ {\it register}\ {\it label} \text{ -- branch if less than} \\ {\it instruction}\ &::=&\ {\rm ble}\ {\it register}\ {\it register}\ {\it label} \text{ -- branch if less than or equal to} \\ {\it instruction}\ &::=&\ {\rm call}\ {\it label} \text{ -- direct function call} \\ {\it instruction}\ &::=&\ {\rm call}\ {\it register} \text{ -- register-indirect function call} \\ {\it instruction}\ &::=&\ {\rm return} \text{ -- function return} \\ \\ {\it instruction}\ &::=&\ {\rm push}\ {\it register} \text{ -- push a value on the stack} \\ {\it instruction}\ &::=&\ {\rm pop}\ {\it register} \text{ -- push a value off the stack} \\ {\it instruction}\ &::=&\ {\rm ld}\ {\it register}\ < \!\!\!-\ {\it register}\ {\rm [}\ {\it integer}\ {\rm ]} \text{ -- load a value from memory} \\ {\it instruction}\ &::=&\ {\rm st}\ {\it register}\ {\rm [}\ {\it integer}\ {\rm ]}\ < \!\!\!-\ {\it register} \text{ -- store a value into memory} \\ {\it instruction}\ &::=&\ {\rm la}\ {\it register}\ < \!\!\!-\ {\it label} \text{ -- load an address into a register} \\ \\ {\it instruction}\ &::=&\ {\rm alloc}\ {\it register}\ {\it register} \text{ -- allocate memory} \\ {\it instruction}\ &::=&\ {\rm constant}\ {\it integer} \text{ -- lay out a compile-time constant in memory} \\ {\it instruction}\ &::=&\ {\rm constant}\ {\it raw\_string} \text{ -- lay out a compile-time constant in memory} \\ {\it instruction}\ &::=&\ {\rm constant}\ {\it label} \text{ -- lay out a compile-time constant in memory} \\ \\ {\it instruction}\ &::=&\ {\rm syscall}\ {\it name} \text{ -- request a service from the run-time system} \\ \\ {\it instruction}\ &::=&\ {\rm debug}\ {\it register} \text{ -- debugging support: print register value} \\ {\it instruction}\ &::=&\ {\rm trace} \text{ -- toggle tracing} \\ \end{eqnarray}That's it, and the last two do not really count. We next describe the interpretation of these instructions in more detail.
The system calls available are:
That system calls correspond directly to internal predefined methods on Cool Int and String objects. The key difference is that the system calls work on raw values (i.e., machine-level ints and strings) and not on Cool Objects.
The normal Cool compiler executable (e.g., \rm cool.exe) also serves as a Cool CPU Simulator that executes Cool Assembly Language programs. Just pass \rm file.cl-asm as an argument.
The simulator performs the following actions:
The constant values listed above (1000; 20,000; 2,000,000,000) should not be counted on by your program, but are listed here to help with debugging. Addresses near 1000 hold program instructions or compile-time data (i.e., the code segment), addresses near 20,000 hold the heap, and addresses near two billion are on the stack.
Debugging assembly language programs is notoriously difficult! While writing your code generator, you will spend quite a bit of time running generated Cool Assembly programs through the Cool CPU Simulator to see if they work. Often they will not. The Cool CPU Simulator has been designed with a large number of features to aid debugging. Basically none of these features are present in traditional assemblers, so you actually have a wealth of debugging support, but it will still be difficult.
The Cool reference compiler also includes options to produce control flow graphic visualizations in the style of the dotty tool from the Graphviz toolkit.
Passing the \rm --cfg option (with, for example, \rm --opt\ --asm) produces \it method\rm .dot, which can then be inspected via a number of tools. For example, this program:
class Main { main():Object { if (isvoid self) then (new IO).out_string("cannot happen!\n") else (new IO).out_string("hello, world!\n") fi }; };
Might produce this control-flow graph:
While you do not have to match the reference compiler exactly, inspecting its control-flow graphs can help you debug your own code to create control-flow graphs.
As discussed above, the Cool reference compiler also includes a reference machine simulator to interpret Cool Assembly Language instructions. This simulator can be invoked directly by passing a \rm .cl \HY asm file to \rm cool.exe:
cool$ cat hello-world.cl class Main { main():Object { (new IO).out_string("hello, world!\n") }; }; cool$ ./cool --asm hello-world.cl cool$ ./cool hello-world.cl-asm hello, world!
The simulator can also give detailed performance information:
cool$ ./cool --profile hello-world.cl-asm hello, world! PROFILE: instructions = 107 @ 1 => 107 PROFILE: pushes and pops = 29 @ 1 => 29 PROFILE: cache hits = 22 @ 0 => 0 PROFILE: cache misses = 570 @ 100 => 57000 PROFILE: branch predictions = 0 @ 0 => 0 PROFILE: branch mispredictions = 11 @ 20 => 220 PROFILE: multiplications = 0 @ 10 => 0 PROFILE: divisions = 0 @ 40 => 0 PROFILE: system calls = 2 @ 1000 => 2000 CYCLES: 59356
The execution time of a Cool Assembly Language program is measured in simulated instruction cycles. In general, each assembly instruction takes one cycle. Some instructions, such as system calls or memory operation, can cost many more cycles. The total cycle cost of a program is the sum of all of its component cycle costs.
In modern architectures, memory hierarchy effects (e.g., caching) and branch prediction are dominant factors in the execution speed of a program. To give you a flavor for what real-world code optimization is like, the Cool Simulator also simulates a cache and a branch predictor.
The Cool Simulator features a 64-word least-recently-used fully associative combined instruction and data cache. It also uses a static backward = taken, forward = not taken branch prediction scheme.
We now discuss each of the performance components in turn:
This cost model involves realistic components but potentially unrealistic values (e.g., a modern CPU would have a much larger non-associative cache, and also a much larger cache miss cost). If you're interested in that sort of performance modeling, take a graduate class in computer architecture. You should know that this CPU performance model is one of the most realistic that I've seen for a compiler optimization project in terms of the issues that it forces you to address.
The reference compiler includes a simple reference peephole optimizer, as well as a few optimizations backed by dataflow analyses (liveness, reaching definitions, constant folding) and register allocation enabled via the \rm --opt flag. You can use it to get an idea for how to get started (but note that we are evil and strip all comments from the optimized output).
yuki:~/src/cool$ ./cool --opt --asm hello-world.cl yuki:~/src/cool$ ./cool --profile hello-world.cl-asm hello, world! PROFILE: instructions = 79 @ 1 => 79 PROFILE: pushes and pops = 23 @ 1 => 23 PROFILE: cache hits = 15 @ 0 => 0 PROFILE: cache misses = 513 @ 100 => 51300 PROFILE: branch predictions = 2 @ 0 => 0 PROFILE: branch mispredictions = 7 @ 20 => 140 PROFILE: multiplications = 0 @ 10 => 0 PROFILE: divisions = 0 @ 40 => 0 PROFILE: system calls = 2 @ 1000 => 2000 CYCLES: 53542
For the \rm hello-world program, this optimizer reduces the cycle cost from 59356 to 53453 -- a 10% improvement. If you are writing an optimizer, you will want to do at least as well as the reference, averaged over many input programs. Notably, you'll probably want to implement much more than the required dead code elimination optimization.
Cool is based on Sather164, which is itself based on the language Sather. Portions of this document were cribbed from the Sather164 manual; in turn, portions of the Sather164 manual are based on Sather documentation written by Stephen M. Omohundro.
A number people have contributed to the design and implementation of Cool, including Manuel Fähndrich, David Gay, Douglas Hauge, Megan Jacoby, Tendo Kayiira, Carleton Miyamoto, and Michael Stoddart. Joe Darcy updated Cool to the current version.
This version (used in Virginia Programming Language Design and Implementation courses) of Cool owes a great debt to George C. Necula and Bor-Yuh Evan Chang.